Abstract: For a broad topic, different users may have different
search goals when they submit a query to a search engine.
Inferring and analyzing user search goals can therefore be very
useful for improving search engine relevance and user experience.
In this paper, we first propose a way to infer user search goals by
analyzing feedback sessions constructed from search engine query
logs. Second, we propose a new method for generating
pseudo-documents that better represent the feedback sessions for
clustering. Finally, we propose a new criterion to evaluate the
performance of inferring user search goals.

Index Terms: user search queries, feedback sessions,
pseudo-documents, ranking, restructuring search results.


I. INTRODUCTION 

In online search applications, users submit queries to search
engines to represent their information needs. However, a query
may not precisely represent a user’s specific information need,
and different users may want different information when they
submit the same query. For example, when the query “the sun” is
submitted to a search engine, some users want to locate the
homepage of a United Kingdom newspaper, while others want to
learn about the sun as a natural object, as shown in Fig. 1.
Therefore, it is essential to capture the different user search
goals in information retrieval. We define user search goals as the
information on different aspects of a query that user groups want
to find. An information need is a user’s particular desire to
obtain information to satisfy his/her need. User search goals can
be considered as the clusters of information needs for a query.


In our approach, feedback sessions are mapped to pseudo-documents
that reflect user information needs. We then cluster these
pseudo-documents to infer user search goals and depict them with
some keywords.
Fig. 1. Examples of different user search goals and their
distributions for the query “cat” in our experiment.

Finally, we propose an evaluation criterion, classified average
precision (CAP), to evaluate the performance of the restructured
web search results.


The inference and analysis of user search goals offer several
advantages in improving user experience and search engine
relevance.


II. Metric for Inferring User Search Goals 


A. Framework of our approach 


In this paper, our objective is to discover the diverse user
search goals for a query and to depict each goal with some
keywords automatically. We propose a new approach that infers
user search goals for a query by clustering feedback sessions. A
feedback session is defined as the sequence of both clicked and
unclicked URLs, along with their ranks, that ends with the last
URL clicked in a session from the user search logs. We then
propose a new optimization method to map feedback sessions to
pseudo-documents, which can efficiently reflect user information
needs.
Fig. 2 shows the framework of our approach, taking the ambiguous
query “the sun” as an example. It contains two parts divided by
the dashed line.

In the upper part, all the feedback sessions of a query are first
extracted from the user search logs and mapped to
pseudo-documents. Then, user search goals are inferred by
clustering these pseudo-documents and depicted with some
keywords. In the bottom part, the original search results are
restructured based on the user search goals inferred in the upper
part. Then, we evaluate the performance of restructuring search
results with our proposed evaluation criterion, CAP. The
evaluation result is used as feedback to select the optimal
number of user search goals in the upper part.
Fig. 2. Framework of our approach.


III. Representation of Feedback sessions 

In this section, we first describe the proposed feedback
sessions and then present the proposed pseudo-documents that
represent them.

In this paper, we focus on inferring user search goals for a
particular query. Therefore, we introduce the single session,
which contains only one query and thus differs from the
conventional session. The feedback session in this paper is based
on a single session, although it can be extended to the whole
session.


The proposed feedback session consists of the clicked URLs, the
unclicked URLs, and the page ranks of these URLs, and it ends
with the last URL that was clicked in a single session. The
motivation is that, before the last click, all the URLs have been
scanned and evaluated by the user. Therefore, besides the clicked
URLs, the unclicked ones before the last click and their
corresponding page ranks should be part of the user feedback.

In Fig. 3, the left part lists 10 search results of the query “the
sun” and the right part is a user’s click sequence and rank, where
“0” means “unclicked” and the numbers 1, 2, 3 show the order of
the clicked URLs. The single session includes all 10 URLs in
Fig. 3, while the feedback session includes only the seven URLs
in the rectangular box. In this example, the seven URLs consist
of three clicked URLs and four unclicked URLs.
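
The extraction of a feedback session from a single session can be
sketched as follows. This is only an illustration: the click
sequence is hypothetical (it is not read from Fig. 3), the names
`Result` and `extract_feedback_session` are ours, and the sketch
assumes that the last clicked URL is also the lowest-ranked clicked
one, which holds when the user clicks from top to bottom.

```python
from typing import List, Tuple

# One ranked result: (url, click_order). click_order 0 means "unclicked";
# 1, 2, 3, ... give the order in which the user clicked.
Result = Tuple[str, int]


def extract_feedback_session(single_session: List[Result]) -> List[Result]:
    """Keep the ranked results up to and including the deepest clicked one."""
    clicked_positions = [
        i for i, (_, order) in enumerate(single_session) if order > 0
    ]
    if not clicked_positions:
        return []  # no click, so the session carries no feedback
    return single_session[: clicked_positions[-1] + 1]


# Hypothetical click sequence for 10 ranked results (assumption).
click_sequence = [0, 1, 0, 0, 2, 0, 3, 0, 0, 0]
single_session = [
    (f"url_{rank}", order) for rank, order in enumerate(click_sequence, start=1)
]
feedback_session = extract_feedback_session(single_session)
n_clicked = sum(1 for _, order in feedback_session if order > 0)
# Prints "7 3 4": seven URLs, three clicked and four unclicked.
print(len(feedback_session), n_clicked, len(feedback_session) - n_clicked)
```

The three URLs ranked after the last click are dropped because the
user may never have looked at them.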
[Fig. 3 columns: Search results | Page rank | Click sequence]
In the first step, we enrich the URLs with additional textual
content by extracting the titles and snippets of the returned
URLs contained in the feedback session. Each URL’s title and
snippet are then represented by a Term Frequency-Inverse Document
Frequency (TF-IDF) vector [1], respectively, as in

\[
T_{u_i} = [t_{w_1}, t_{w_2}, \ldots, t_{w_n}]^T, \qquad
S_{u_i} = [s_{w_1}, s_{w_2}, \ldots, s_{w_n}]^T.
\tag{1}
\]


Fig. 3. A feedback session in a single session. “0” in click 
sequence means “unclicked.” All the 10 URLs construct a single 
session. The URLs in the rectangular box construct a feedback 
session. 

B. Map Feedback Sessions to Pseudo-Documents 

In this paper, we propose a new way to map feedback sessions to
pseudo-documents, as illustrated in Fig. 6. Building a
pseudo-document involves two steps, which are described below. As
a point of reference, Fig. 4 shows a popular binary vector method
for representing a feedback session.
Fig. 4. The binary vector representation of a feedback session.

For a query, users will usually have some vague keywords 
representing their interests in their minds. They use these 
keywords to determine whether a document can satisfy their 
needs. We name these keywords “goal texts” as shown in Fig. 
5. 



Fig. 5 Goal texts. For a query, different users will have different 
keywords in their minds. These keywords are vague and have no 
order. We name them “goal texts,” which reflect user 
information needs. 

1) Representing the URLs in the feedback session 


In (1), $T_{u_i}$ and $S_{u_i}$ are the TF-IDF vectors of the
title and snippet of $u_i$, respectively, where $u_i$ means the
ith URL in the feedback session, and $w_j$ (j = 1, 2, ..., n) is
the jth term appearing in the enriched URLs. Here, a “term” is
defined as a word or a number in the dictionary of the document
collection. $t_{w_j}$ and $s_{w_j}$ represent the TF-IDF values of
the jth term in the URL’s title and snippet, respectively.
Considering that URLs’ titles and snippets have different
significance, we represent the enriched URL by the weighted sum
of $T_{u_i}$ and $S_{u_i}$, namely

\[
F_{u_i} = \omega_t T_{u_i} + \omega_s S_{u_i}
        = [f_{w_1}, f_{w_2}, \ldots, f_{w_n}]^T,
\tag{2}
\]

where $F_{u_i}$ is the feature representation of the ith URL in
the feedback session, and $\omega_t$ and $\omega_s$ are the
weights of the titles and the snippets, respectively. We stipulate
that titles should be more significant than snippets. Therefore,
the weight of the titles should be higher, and we set $\omega_t$
to 2 in this paper.
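
A minimal sketch of (1) and (2) is given below. Several details are
assumptions rather than part of the method as stated: the TF-IDF
variant (raw term counts with idf = log(N / df)), the choice to
compute document frequencies separately over the titles and over the
snippets, the snippet weight `w_s = 1.0`, the already tokenized toy
input, and the function names `tfidf_vectors` and `enrich_urls`.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_vectors(docs: List[List[str]], vocab: List[str]) -> List[List[float]]:
    """TF-IDF with raw counts and idf = log(N / df), one common variant."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            [tf[t] * math.log(n_docs / df[t]) if df[t] else 0.0 for t in vocab]
        )
    return vectors


def enrich_urls(
    titles: List[List[str]],
    snippets: List[List[str]],
    w_t: float = 2.0,  # title weight, set to 2 in the paper
    w_s: float = 1.0,  # snippet weight: an assumption of this sketch
) -> Tuple[List[str], List[List[float]]]:
    """Return the shared vocabulary and F_ui = w_t * T_ui + w_s * S_ui."""
    vocab = sorted({term for doc in titles + snippets for term in doc})
    t_vecs = tfidf_vectors(titles, vocab)    # T_ui, as in (1)
    s_vecs = tfidf_vectors(snippets, vocab)  # S_ui, as in (1)
    features = [
        [w_t * t + w_s * s for t, s in zip(t_vec, s_vec)]
        for t_vec, s_vec in zip(t_vecs, s_vecs)
    ]
    return vocab, features


# Toy, already tokenized titles and snippets of two URLs (hypothetical).
titles = [["the", "sun", "newspaper"], ["sun", "solar", "system"]]
snippets = [["uk", "news", "sun"], ["the", "sun", "is", "a", "star"]]
vocab, features = enrich_urls(titles, snippets)
for term, value in zip(vocab, features[0]):
    print(f"{term:10s} {value:.3f}")
```

With this idf variant, a term such as "sun" that occurs in every
title and every snippet receives weight 0, so the enriched vectors
are dominated by the terms that distinguish one URL from another.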


[Fig. 6 panel label: Enriched URLs]

Fig. 6. Illustration of mapping feedback sessions to
pseudo-documents.

2) Generate pseudo-document based on URL 
representations 

In order to obtain the feature representation of a feedback 
session, we propose an optimization method to combine both 
clicked and unclicked URLs in the feedback session. 
Let $F_{fs}$ be the feature representation of a feedback session,
and $f_{fs}(w)$ be the value for the term w. Let $F_{uc_m}$
(m = 1, 2, ..., M) and $F_{\overline{uc}_l}$ (l = 1, 2, ..., L) be
the feature representations of the clicked and unclicked URLs in
this feedback session, respectively. Let $f_{uc_m}(w)$ and
$f_{\overline{uc}_l}(w)$ be the values for the term w in these
vectors. We want to obtain an $F_{fs}$ such that the sum of the
distances between $F_{fs}$ and each $F_{uc_m}$ is minimized and
the sum of the distances between $F_{fs}$ and each
$F_{\overline{uc}_l}$ is maximized. Based on the assumption that
the terms in the vectors are independent, we can perform the
optimization on each dimension independently, as shown in (3):

\[
F_{fs} = [f_{fs}(w_1), f_{fs}(w_2), \ldots, f_{fs}(w_n)]^T,
\]
\[
f_{fs}(w) = \arg\min_{f_{fs}(w)} \Big\{
  \sum_{m=1}^{M} [f_{fs}(w) - f_{uc_m}(w)]^2
  - \lambda \sum_{l=1}^{L} [f_{fs}(w) - f_{\overline{uc}_l}(w)]^2
\Big\},
\tag{3}
\]

where $\lambda$ is a parameter balancing the importance of the
clicked and unclicked URLs.
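
The per-dimension optimization can be sketched as below. This is a
sketch under stated assumptions, not the authors' implementation: it
assumes a trade-off weight `lam` between the clicked and unclicked
terms (0.5 here), and it restricts each value to the interval spanned
by the clicked URLs so that the minimum always exists. The function
name `pseudo_document` is ours.

```python
from typing import List, Sequence


def pseudo_document(
    clicked: Sequence[Sequence[float]],
    unclicked: Sequence[Sequence[float]],
    lam: float = 0.5,  # assumed weight of the unclicked term
) -> List[float]:
    """Per dimension, stay close to clicked and far from unclicked URLs."""
    n_terms = len(clicked[0])
    n_clicked, n_unclicked = len(clicked), len(unclicked)
    result = []
    for j in range(n_terms):
        a = [vec[j] for vec in clicked]
        b = [vec[j] for vec in unclicked]

        def objective(f: float) -> float:
            near = sum((f - x) ** 2 for x in a)
            far = sum((f - y) ** 2 for y in b)
            return near - lam * far

        # Assumed search interval: the range of the clicked values.
        lo, hi = min(a), max(a)
        candidates = [lo, hi]
        # The objective is quadratic in f; if it is convex, its stationary
        # point is a candidate whenever it lies inside the interval.
        curvature = n_clicked - lam * n_unclicked
        if curvature > 0:
            stationary = (sum(a) - lam * sum(b)) / curvature
            if lo <= stationary <= hi:
                candidates.append(stationary)
        result.append(min(candidates, key=objective))
    return result


# Two clicked and one unclicked URL over two terms (toy values).
clicked_vecs = [[1.0, 0.0], [3.0, 0.0]]
unclicked_vecs = [[0.0, 2.0]]
print(pseudo_document(clicked_vecs, unclicked_vecs))
```

In the first dimension the result is 8/3, slightly above the clicked
mean of 2, because the unclicked value 0 pushes it away. In the second
dimension the clicked URLs never use the term, so the value stays 0.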
When search results are restructured into classes, there should
be a risk term that penalizes classifying search results into too
many classes by error. We propose the risk as follows:


\[
\mathrm{Risk} = \frac{\sum_{i,j=1\,(i<j)}^{m} d_{ij}}{C_m^2}.
\tag{5}
\]


It calculates the normalized number of clicked URL pairs that are
not in the same class, where m is the number of clicked URLs and
$C_m^2 = m(m-1)/2$ is the total number of clicked URL pairs. If
the ith clicked URL and the jth clicked URL are not categorized
into the same class, $d_{ij}$ is 1; otherwise, it is 0. In the
example of Fig. 7b, the lines connect the clicked URL pairs, and
the value on each line reflects whether the two URLs are in the
same class or not. The risk in Fig. 7b is then
Risk = 3/6 = 1/2. Based on the above discussion, we further extend
VAP by introducing the Risk and propose a new criterion,
“Classified AP” (CAP), as shown below:


\[
\mathrm{CAP} = \mathrm{VAP} \times (1 - \mathrm{Risk})^{\gamma}.
\tag{6}
\]
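
Risk and CAP can be computed as in the following sketch. The class
labels and the VAP value are hypothetical; the labels are merely
chosen so that three of the six clicked URL pairs are split, matching
Risk = 3/6. The exponent default of 0.7 is the value of γ tuned in
the experiments, and returning 0 for fewer than two clicks is an
assumption of this sketch, since no pair exists in that case.

```python
from itertools import combinations
from typing import Hashable, Sequence


def risk(clicked_classes: Sequence[Hashable]) -> float:
    """Fraction of clicked URL pairs that fall into different classes."""
    if len(clicked_classes) < 2:
        return 0.0  # no pair exists; treated as zero risk (assumption)
    pairs = list(combinations(clicked_classes, 2))
    split_pairs = sum(1 for ci, cj in pairs if ci != cj)  # sum of d_ij
    return split_pairs / len(pairs)  # len(pairs) equals m(m-1)/2


def cap(vap: float, risk_value: float, gamma: float = 0.7) -> float:
    """Classified AP: VAP discounted by the risk term."""
    return vap * (1.0 - risk_value) ** gamma


# Hypothetical class labels of four clicked URLs: one URL is separated
# from the other three, so 3 of the 6 pairs are split.
labels_of_clicked = [1, 2, 2, 2]
r = risk(labels_of_clicked)
print(r)            # prints 0.5
print(cap(0.8, r))  # CAP for a hypothetical VAP of 0.8
```

With Risk = 0 the criterion reduces to VAP, and with Risk = 1 (every
clicked URL in a different class) it drops to 0.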

IV. Evaluation Criterion 


To apply the evaluation method to large-scale data, the single
sessions in user click-through logs are used to minimize manual
work, because click-through logs provide implicit relevance
feedback: “clicked” means relevant and “unclicked” means
irrelevant.

A possible evaluation criterion is the average precision
(AP) [1], which evaluates the results according to user implicit
feedback. AP is the average of the precisions computed at the
point of each relevant document in the ranked sequence, as shown
in

\[
\mathrm{AP} = \frac{1}{N^{+}} \sum_{r=1}^{N} \mathrm{rel}(r)\,
\frac{R_r}{r},
\tag{4}
\]

where $N^{+}$ is the number of clicked documents among the
retrieved ones, r is the rank, N is the total number of retrieved
documents, rel() is a binary function on the relevance of a given
rank, and $R_r$ is the number of relevant retrieved documents of
rank r or less.


For example, Fig. 7a is a single session with the user’s implicit
feedback, and we can compute AP as
(1/4) × (1/2 + 2/3 + 3/7 + 4/9) = 0.510.
However, AP is not suitable for evaluating restructured or
clustered search results. The proposed new criterion for
evaluating restructured results is described in the following.
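
The AP computation in (4) can be checked with a short sketch. The
click positions (ranks 2, 3, 7, and 9) are read off the worked
arithmetic above; the total of nine retrieved results and the
function name `average_precision` are assumptions.

```python
from typing import Sequence


def average_precision(clicked: Sequence[bool]) -> float:
    """AP over a ranked list; clicked[r-1] is True if rank r was clicked."""
    n_clicked = sum(clicked)  # N+
    if n_clicked == 0:
        return 0.0
    total = 0.0
    relevant_so_far = 0  # R_r
    for rank, is_clicked in enumerate(clicked, start=1):
        if is_clicked:  # rel(r) = 1
            relevant_so_far += 1
            total += relevant_so_far / rank
    return total / n_clicked


# Clicks at ranks 2, 3, 7 and 9.
session_clicks = [False, True, True, False, False, False, True, False, True]
ap = average_precision(session_clicks)
print(round(ap, 3))  # prints 0.51
```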
Fig. 7. Illustration of the calculation of AP, VAP, and Risk.

V. Experiments


As shown in Fig. 7b, the URLs in the single session are
restructured into two classes, where the un-boldfaced ones in
Fig. 7a are clustered into class 1 and the boldfaced ones are
clustered into class 2. We first introduce “Voted AP” (VAP), which
is the AP of the class that includes the most clicks, namely
votes.

However, VAP is still an unsatisfactory criterion. Consider an
extreme case: if each URL in the click session is categorized into
a class of its own, VAP will always take the highest value, namely
1, no matter whether users really have that many search goals or
not.
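
The definition of VAP and the extreme case above can be illustrated
with the following sketch. The class assignment is hypothetical, ties
between classes with equally many clicks are broken by first
occurrence (an assumption), and `voted_ap` is our own name.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def average_precision(clicked: Sequence[bool]) -> float:
    """AP of a ranked list of clicked (True) / unclicked (False) results."""
    n_clicked = sum(clicked)
    if n_clicked == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, is_clicked in enumerate(clicked, start=1):
        if is_clicked:
            hits += 1
            total += hits / rank
    return total / n_clicked


def voted_ap(clicked: Sequence[bool], classes: Sequence[Hashable]) -> float:
    """AP of the class that received the most clicks (votes)."""
    by_class = defaultdict(list)
    for is_clicked, label in zip(clicked, classes):
        by_class[label].append(is_clicked)  # rank order is preserved
    winner = max(by_class.values(), key=sum)
    return average_precision(winner)


clicks = [False, True, True, False, False, False, True, False, True]
# Hypothetical restructuring of the nine results into two classes.
two_classes = [1, 2, 2, 1, 2, 1, 1, 1, 2]
print(voted_ap(clicks, two_classes))

# Extreme case: every URL forms a class of its own, so VAP is 1.
print(voted_ap(clicks, list(range(len(clicks)))))  # prints 1.0
```

In the two-class example, class 2 holds three of the four clicks at
its ranks 1, 2, and 4, so VAP = (1 + 1 + 3/4) / 3. The second call
shows why VAP alone rewards over-splitting.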


In this section, we present experiments with our proposed
algorithm. The data set is based on the click-through logs from
a commercial search engine (google.co.in) collected over a
period of two months, comprising 2,300 different queries, 2.5
million single sessions, and 2.93 million clicks in total. On
average, each query has 1,087 single sessions and 1,274
clicks. However, these queries are chosen randomly and their
click numbers differ greatly. After excluding the queries with
fewer than five different clicked URLs, we still have 1,520
queries. Before using the data set, we preprocess the
click-through logs, including enriching URLs and term
processing.

When clustering the feedback sessions of a query, we try five
different values of K (1, 2, ..., 5) in K-means clustering. For
each K, we restructure the search results according to the
inferred user search goals and evaluate the performance by CAP.
Finally, we select the K with the highest CAP.
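
The selection loop can be sketched as follows. The K-means routine is
a plain Lloyd iteration with a deterministic initialization (the first
K points), chosen here only to keep the example reproducible. The
scoring callback `cap_of_labels` is hypothetical: in the real pipeline
it would restructure the search results according to the cluster
labels and return the resulting CAP, whereas `toy_cap` below is a
placeholder with made-up scores.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(p: Vector, q: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points: Sequence[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's algorithm; centroids start at the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_k(
    points: Sequence[Vector],
    cap_of_labels: Callable[[List[int]], float],
    k_values: Sequence[int] = (1, 2, 3, 4, 5),
) -> Tuple[int, Dict[int, float]]:
    """Cluster for each K and keep the K whose clustering scores highest."""
    scores = {
        k: cap_of_labels(kmeans(points, k)) for k in k_values if k <= len(points)
    }
    return max(scores, key=scores.get), scores


# Toy pseudo-document vectors forming two obvious groups.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]


def toy_cap(labels: List[int]) -> float:
    """Placeholder scorer with made-up values, standing in for real CAP."""
    return 0.9 if len(set(labels)) == 2 else 0.5


best_k, scores = select_k(points, toy_cap)
print(best_k, scores)
```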

Before computing CAP, we need to determine $\gamma$ in (6).
We select 20 queries and empirically decide the number of
user search goals of these queries. Then, we cluster the
feedback sessions and restructure the search results with the
inferred user search goals. We tune the parameter $\gamma$ so
that CAP is highest when K in K-means agrees with what we
expected for most queries. Based on this process, the optimal
$\gamma$ ranges from 0.6 to 0.8 for the 20 queries. The mean
and the variance of the optimal $\gamma$ are 0.697 and 0.005,
respectively. Thus, we set $\gamma$ to 0.7. Moreover, we use
another 20 queries to compute CAP with the optimal $\gamma$
(0.7), and the result shows that it is proper to set $\gamma$
to 0.7. In the following, we compare our method with two other
methods in restructuring web search results.


VI. Objective Evaluation and Comparison

In this section, we give an objective evaluation of our search
goal inference method and compare it with two other methods.

Three methods are compared. They are described as follows:

• Our proposed method clusters feedback sessions to infer user
  search goals.

• Method I clusters the top 100 search results to infer user
  search goals [6], [20]. First, we automatically resubmit the
  queries to the search engine and crawl the top 100 search
  results, including their titles and snippets, for each query.
  Then, each search result is mapped to a feature vector
  according to (1) and (2). Finally, we cluster these 100 search
  results of a query by K-means clustering to infer user search
  goals and select the optimal K based on the CAP criterion.

• Method II clusters different clicked URLs directly [18]. In
  user click-through logs, a query has many different single
  sessions; however, the different clicked URLs may be few.
  First, we select these different clicked URLs for a query from
  the user click-through logs and enrich them with their titles
  and snippets as we do in our method. Then, each clicked URL is
  mapped to a feature vector according to (1) and (2). Finally,
  we cluster these different clicked URLs directly to infer user
  search goals as we do in our method and Method I.

In order to demonstrate that, when inferring user search goals,
clustering our proposed feedback sessions is more efficient than
clustering search results or clicked URLs directly, we use the
same framework and clustering method for all three. The only
difference is that the samples the three methods cluster are
different. Note that, to make the format of the data set suitable
for Method I and Method II, some data reorganization is performed
on the data set. The performance evaluation and comparison are
based on restructuring web search results.
Fig. 8. Comparison of three methods for 1,520 queries. Each
point represents the average Risk and VAP of a query when
evaluating the performance of restructuring the search results.

As shown in Fig. 8, we compare the three methods on all 1,520
queries. Fig. 8a compares our method with Method I, and Fig. 8b
compares ours with Method II. Risk and VAP are used together to
evaluate the performance of restructuring search results.

Each point in Fig. 8 represents the average Risk and VAP of a
query. If the search results of a query are restructured properly,
Risk should be small and VAP should be high, so the point should
tend toward the top left corner. We can see that the points of our
method are comparatively closer to the top left corner. We compute
the mean average VAP, Risk, and CAP of all 1,520 queries, as shown
in Table 2.

We can see that the mean average CAP of our method is the
highest: 8.22 and 3.44 percent higher, in relative terms, than
those of Methods I and II, respectively. The results of Method I
are lower than ours due to the lack of user feedback. The results
of Method II, however, are close to ours.
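
The two percentages are relative gains over each baseline, which can
be verified from the mean average CAP values reported in Table 2. The
helper name `relative_gain_percent` is ours.

```python
# Mean average CAP values as reported in Table 2.
mean_cap = {"Our Method": 0.632, "Method I": 0.584, "Method II": 0.611}


def relative_gain_percent(ours: float, other: float) -> float:
    """Relative improvement of `ours` over `other`, in percent."""
    return 100.0 * (ours - other) / other


gain_vs_method_1 = relative_gain_percent(mean_cap["Our Method"], mean_cap["Method I"])
gain_vs_method_2 = relative_gain_percent(mean_cap["Our Method"], mean_cap["Method II"])
print(round(gain_vs_method_1, 2), round(gain_vs_method_2, 2))  # prints 8.22 3.44
```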
TABLE 2

CAP Comparison of Three Methods for 1,520 Queries

Method       Mean Average VAP   Mean Average Risk   Mean Average CAP
Our Method   0.755              0.224               0.632
Method I     0.680              0.196               0.584
Method II    0.742              0.243               0.611


The experimental results comparing the three methods on the 100
most ambiguous queries are shown in Fig. 9.



Fig. 9. Comparison of three methods for the 100 most ambiguous
queries. Each point represents the average Risk and VAP of a
query when evaluating the performance of restructuring the search
results.


TABLE 3

CAP Comparison of Three Methods for 100 Most Ambiguous Queries

Method       Mean Average VAP   Mean Average Risk   Mean Average CAP
Our Method   0.807              0.159               0.715
Method I     0.583              0.138               0.525
Method II    0.750              0.231               0.624



[Fig. 10 axis label: Query ID]

Fig. 10. Chart of the CAP comparison of three methods for the
100 most ambiguous queries.

VII. ADVANTAGES 


• First, we can restructure web search results according to
  user search goals by grouping the search results with the
  same search goal; thus, users with different search goals can
  easily find what they want.

• Second, user search goals represented by some keywords can be
  utilized in query recommendation; thus, the suggested queries
  can help users form their queries more precisely.

• Third, the distributions of user search goals can also be
  useful in applications such as reranking web search results
  that contain different user search goals.


VIII. CONCLUSION 

In this paper, a new metric has been proposed to infer user
search goals for a query by clustering its feedback sessions
represented by pseudo-documents. First, we introduce feedback
sessions, rather than search results or clicked URLs, as the data
to be analyzed for inferring user search goals. Both the clicked
URLs and the unclicked ones before the last click, together with
their corresponding page ranks, are considered as user implicit
feedback and taken into account to construct feedback sessions.
Therefore, feedback sessions can reflect user information needs
more efficiently.

Second, we map feedback sessions to pseudo-documents to
approximate the goal texts in users’ minds. The pseudo-documents
enrich the URLs with additional textual content, including the
titles and snippets. Based on these pseudo-documents, user search
goals can then be discovered and depicted with some keywords.
Finally, a new criterion, CAP, is proposed to evaluate the
performance of user search goal inference. The complexity of our
approach is low, and it can easily be used in practice. For each
query, the running time depends on the number of feedback
sessions.